Machine Learning Process steps
1. Data assessment
To start, check data feasibility: do we even have the right data sets to run machine learning models on? Do we get data fast enough to make predictions?
For example, quick-service restaurant chains (QSRs) have access to data on millions of registered customers, a volume that is usually large enough to train and run ML models on.
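The two feasibility questions above (is there enough data, and does it arrive fast enough?) can be turned into an automated check. The sketch below is illustrative only: the `check_feasibility` function, the 100,000-row minimum, and the one-hour staleness limit are hypothetical values, not recommendations from this article.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class FeasibilityReport:
    enough_rows: bool   # is there enough data to train on?
    fresh_enough: bool  # does new data arrive fast enough to predict on?

    @property
    def feasible(self) -> bool:
        # Both conditions must hold before investing in modeling.
        return self.enough_rows and self.fresh_enough


def check_feasibility(
    row_count: int,
    last_update: datetime,                          # must be timezone-aware
    min_rows: int = 100_000,                        # hypothetical volume threshold
    max_staleness: timedelta = timedelta(hours=1),  # hypothetical freshness threshold
    now: Optional[datetime] = None,
) -> FeasibilityReport:
    """Answer the two feasibility questions for one data source."""
    if now is None:
        now = datetime.now(timezone.utc)
    return FeasibilityReport(
        enough_rows=row_count >= min_rows,
        fresh_enough=(now - last_update) <= max_staleness,
    )
```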
Once these data risks are mitigated, set up a data lake environment with easy and powerful access to the variety of required data sources. A data lake (in place of a traditional data warehouse) can save the team a lot of bureaucratic and manual overhead.
At this step, it is crucial to experiment with the data sets to confirm that they contain enough information to bring about the desired business change. A scalable computing environment that can process the available data quickly is also a primary requirement.
Once the data scientists have cleaned, structured, and processed the different data sets, we strongly advise cataloging the data so it can be reused in the future.
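A data catalog can start as little more than a searchable record of each cleaned data set. The minimal in-memory sketch below assumes hypothetical fields (owner, description, schema, tags) and hypothetical class names; a production catalog would live in a shared metadata store rather than in process memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    name: str
    owner: str
    description: str
    schema: Dict[str, str]                         # column name -> type
    tags: List[str] = field(default_factory=list)  # free-form labels for search


class DataCatalog:
    """Minimal in-memory catalog of processed data sets."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Refuse silent overwrites so two teams cannot clobber each other's entry.
        if entry.name in self._entries:
            raise ValueError(f"data set {entry.name!r} is already registered")
        self._entries[entry.name] = entry

    def search(self, tag: str) -> List[CatalogEntry]:
        """Return every entry carrying the given tag."""
        return [e for e in self._entries.values() if tag in e.tags]
```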
Finally, put a strong, well-thought-out governance and security system in place so that different teams in the organization can share the data freely.
2. ML Model and technology stack
Once the ML models are chosen, run them manually to test their validity. For instance, in personalized email marketing: are the promotional emails being sent bringing in new conversions, or do we need to rethink our strategy?
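For the email example, one common way to answer the question is to compare the conversion rate of customers who received the promotion against a holdout group that did not. The sketch below assumes such a holdout exists and uses a standard two-proportion z-test; the function name is hypothetical.

```python
import math
from typing import Tuple


def conversion_lift_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Two-proportion z-test. Group A received the emails; group B is the holdout.

    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis that the emails have no effect.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value (for example below 0.05) suggests the difference in conversions is unlikely to be chance; a large one suggests the strategy may need rethinking.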
After successful manual tests, choose the right technology. The data science teams should be allowed to choose from a range of technology stacks so that they can experiment and pick the one that makes productionizing ML easiest.
Benchmark the chosen technology against stability, the business use case, future scenarios, and cloud readiness. On cloud readiness, Gartner states that cloud IaaS is projected to grow at 24% YoY until 2022.
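A simple weighted scoring matrix makes the benchmarking step explicit. In the sketch below the weights, the 1-to-5 rating scale, and the stack names are all hypothetical; each team would set its own.

```python
from typing import Dict, List

# Hypothetical weights for the four criteria named above; they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "stability": 0.3,
    "use_case_fit": 0.3,
    "future_scenarios": 0.2,
    "cloud_readiness": 0.2,
}


def score_stack(ratings: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of 1-5 ratings. A missing criterion raises KeyError instead of counting as 0."""
    return sum(weight * ratings[criterion] for criterion, weight in weights.items())


def rank_stacks(candidates: Dict[str, Dict[str, float]]) -> List[str]:
    """Return candidate stack names ordered from best to worst weighted score."""
    return sorted(candidates, key=lambda name: score_stack(candidates[name]), reverse=True)
```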
3. Smoothing Deployment
We highly recommend standardizing the deployment process so that testing and integration at different points go smoothly.
Data engineers should focus on polishing the codebase, integrating the model (as an API endpoint or a bulk/batch process), and creating workflow automation so that teams can integrate easily.
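Integrating the model both as an API endpoint and as a bulk process is easier when both paths call the same prediction code. The sketch below assumes a hypothetical `ScoringModel` (a hand-written linear score standing in for a trained model) and shows a JSON request handler alongside a batch function; wiring the handler into an actual web framework is left out.

```python
import json
from typing import Dict, List


class ScoringModel:
    """Stand-in for a trained model: a hypothetical linear score over named features."""

    def __init__(self, weights: Dict[str, float], bias: float) -> None:
        self.weights = weights
        self.bias = bias

    def predict(self, features: Dict[str, float]) -> float:
        # Missing features are treated as 0.0.
        return self.bias + sum(w * features.get(k, 0.0) for k, w in self.weights.items())


def handle_request(model: ScoringModel, body: str) -> str:
    """API-endpoint style: one JSON record in, one JSON prediction out."""
    features = json.loads(body)
    return json.dumps({"score": model.predict(features)})


def run_batch(model: ScoringModel, records: List[Dict[str, float]]) -> List[float]:
    """Bulk-process style: score many records in one pass, e.g. from a scheduled job."""
    return [model.predict(r) for r in records]
```

Because both entry points delegate to `predict`, a fix or retrained model reaches the online and batch paths at the same time.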
A complete environment with access to the right data sets and models is essential for any ML model’s success.
4. Post Deployment and Testing
The right frameworks for logging, monitoring, and reporting the results would make the otherwise difficult testing process manageable.
The ML environment should be tested in real time and monitored closely. In a sophisticated experimentation system, test results are fed back to the data engineering teams so that they can update the models.
For example, the data engineers can decide to give more weight in the next iteration to the variants that outperform and less weight to those that underperform.
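One simple way to overweight outperforming variants is to allocate the next iteration's traffic in proportion to each variant's observed conversion rate, while reserving a minimum share so no variant is switched off entirely. This proportional rule and its 5% floor are hypothetical illustrations; multi-armed bandit algorithms are a more principled alternative.

```python
from typing import Dict


def reallocate_traffic(conversion_rates: Dict[str, float], floor: float = 0.05) -> Dict[str, float]:
    """Give each variant a traffic share proportional to its observed conversion rate.

    Every variant keeps at least `floor` of the traffic (hypothetical default), so
    underperformers are underweighted rather than dropped and can be re-evaluated.
    """
    n = len(conversion_rates)
    if floor * n > 1:
        raise ValueError("floor too large for the number of variants")
    total = sum(conversion_rates.values())
    if total == 0:
        # No signal yet: split traffic evenly.
        return {variant: 1 / n for variant in conversion_rates}
    spare = 1 - floor * n  # traffic left after guaranteeing every variant its floor
    return {variant: floor + spare * rate / total for variant, rate in conversion_rates.items()}
```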
Also watch for negative or wildly wrong results, make sure the agreed SLAs are met, and monitor data quality and model performance.
The production environment would thus slowly stabilize.
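The checks in this section (SLAs, data quality, wildly wrong results) can be automated as a health check that returns alerts. The sketch below is hypothetical throughout: the thresholds are placeholders for whatever SLAs the business agrees on, and the 0-to-1 prediction range assumes the model outputs probabilities.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Thresholds:
    # Hypothetical values; real ones should come from the agreed SLAs.
    max_p95_latency_ms: float = 200.0
    max_null_rate: float = 0.01
    min_prediction: float = 0.0
    max_prediction: float = 1.0


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[max(rank - 1, 0)]


def check_health(
    latencies_ms: Sequence[float],
    feature_values: Sequence[Optional[float]],
    predictions: Sequence[float],
    thresholds: Optional[Thresholds] = None,
) -> List[str]:
    """Return human-readable alerts; an empty list means healthy."""
    t = thresholds or Thresholds()
    alerts: List[str] = []
    # SLA: 95% of requests should be answered within the latency budget.
    if latencies_ms and p95(latencies_ms) > t.max_p95_latency_ms:
        alerts.append("latency SLA breached")
    # Data quality: too many missing feature values.
    if feature_values:
        null_rate = sum(v is None for v in feature_values) / len(feature_values)
        if null_rate > t.max_null_rate:
            alerts.append(f"null rate {null_rate:.1%} above threshold")
    # Wildly wrong results: predictions outside the valid range.
    if any(not t.min_prediction <= p <= t.max_prediction for p in predictions):
        alerts.append("out-of-range predictions")
    return alerts
```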
5. Communication and People
The success of every ML model depends heavily on clear communication among the various cross-functional teams involved, so that risks are mitigated at the right step.
Data engineering and data science teams have to work together to put an ML model into production. We advise giving data scientists full control over the system so they can check in code and see production results. Teams might even need training for new environments.
Transparent communication would save everyone effort and time in the end.
Conclusion:
In addition to putting all the above best practices in place, design the machine learning model to be reusable and resilient to changes and drastic events. The best-case scenario is not to have every recommended method in place, but to make specific areas mature and scalable enough that they can be adjusted up or down as time and business requirements demand.
Please email us if you have any further questions on putting machine learning models into production. A full webinar recording on “Productionizing ML models at scale” is also available.